Subtree-Prune-Regraft Graph*
Abstract

Statistical phylogenetic inference methods use tree rearrangement operations such as subtree-prune-regraft (SPR) to perform Markov chain Monte Carlo (MCMC) across tree topologies. These methods are known to mix quickly when sampling from the simple uniform distribution of trees. However, they may become stuck in the local optima of multi-modal posterior distributions for real data, which are induced by non-uniform likelihoods. The structure of the graph induced by tree rearrangement operations is an important determinant of the mixing properties of MCMC. This motivates study of the underlying rooted SPR (rSPR) graph in greater detail.

In this paper, we investigate the rSPR graph in a new way: by calculating Ricci-Ollivier curvature with respect to uniform and Metropolis-Hastings random walks. We confirm using simulation that mean access time distributions depend on distance, degree, and curvature, showing the relevance of these curvature results to stochastic tree search. These calculations require fast new algorithms for constructing and sampling these graphs. We reduce the time required to compute an rSPR graph from O(m^2 n)-time to O(mn^3), where m is the (often large) number of trees in the graph and n their number of leaves. We also reduce the time required to select an SPR neighbor of a tree uniformly at random to O(n) time. We then develop a closed-form solution to characterize how the number of SPR neighbors of a tree changes after an SPR operation is applied to that tree. This gives bounds on the curvature, as well as a flatness-in-the-limit theorem indicating that paths of small topology changes are easy to traverse. However, we find that large topology changes (i.e., moving a large subtree) give pairs of trees with negative curvature. These pairs do not impede mixing in this simple, well-connected space. They may, however, manifest as bottlenecks in the much smaller credible sets induced by phylogenetic posteriors with a likelihood function. This work extends our knowledge of the rSPR graph, in particular the properties that are relevant to sampling it.


1 Introduction 

Molecular phylogenetic methods reconstruct evolutionary trees from DNA or RNA data and are of fundamental importance to modern biology. Statistical phylogenetics is currently the most popular means of reconstructing phylogenetic trees. In this approach, the tree is viewed as an unknown parameter in a likelihood-based statistical inference problem. The likelihood function in this setting is the likelihood of generating the observed sequences via a continuous-time Markov chain (CTMC) evolving down the tree. The CTMC starts from a sequence assumed to be sampled from the stationary distribution [?]. The lengths of the branches of the phylogenetic tree give the “time” parameter in the CTMC, over which the generated sequence accrues mutations, typically in an IID manner across sites. It is now common for researchers to approximate the posterior distribution of trees and their associated parameters in a Bayesian setting using Markov chain Monte Carlo (MCMC).

In order to estimate these distributions accurately, MCMC samplers must sufficiently explore the set of trees. Phylogenetic search algorithms typically attempt to do so through a combination of modifications to the continuous parameters and to the tree topology. Topology changes have been identified as the main limiting factor of Bayesian MCMC algorithms [13, 16], as other parameters cannot be accurately estimated if the topology distribution is not accurately sampled. Commonly used phylogenetics software packages such as MrBayes [29] and BEAST rearrange subtrees via subtree-prune-regraft (SPR) moves (Figure 1(d)). Some instead use the subset of SPR moves called nearest neighbor interchanges (NNI) [?]. Thus, phylogenetic searches can be viewed as traversing the SPR graph: the graph with phylogenetic trees as vertices and SPR adjacencies as edges.

It has become increasingly clear that the structure of the SPR graph plays an important role in determining the accuracy of tree searches. Researchers have previously identified slow mixing in MCMC with pathological data [22, 23, 28]. On the other hand, fast mixing has been identified with exceptionally well-behaved data [38] or with a uniform distribution [34]. Studies on real data [2, 16], however, have identified posteriors which are difficult to sample using MCMC. Previously, the lack of sufficient computational tools for examining phylogenetic posteriors in terms of SPR operations made it difficult to determine the cause of these difficulties. By developing the first such tools, we recently showed that graph structure has a significant effect on MCMC mixing with MrBayes applied to real data [44]. We also showed that multimodal posteriors are common and are separated by “bottlenecks” of specific classes of SPR moves.

Although the SPR graph is thus very important in determining the success of phylogenetic inference procedures, little is still known about the rooted or unrooted versions of the SPR graph itself. Song [32] developed a recursive procedure on a tree to find the degree of the corresponding vertex in the rooted SPR (rSPR) graph, along with corresponding bounds on degree. It has been shown [?] that the diameter Δ_rSPR of the rSPR graph is n − Θ(√n). For the unrooted case, the same work shows that

$$n - 2\lceil \sqrt{n}\,\rceil + 1 \le \Delta_{\mathrm{uSPR}}(n) \le n - 3 - \left\lfloor \frac{\sqrt{n-2} - 1}{2} \right\rfloor. \tag{1}$$

We are not aware of any further work investigating properties of the SPR graph, which may be due to its complexity. Indeed, even computing the distance between topologies in terms of SPR operations (rooted and unrooted) is NP-hard [?]. Fortunately, it is fixed-parameter tractable with respect to the distance in the rooted case. Efficient fixed-parameter algorithms have recently been developed [?] and have begun to allow such investigation.

Ollivier and colleagues recently pioneered a new approach to calculating Ricci curvature on a general type of metric space, including graphs [15, 25]. In this framework, local information about the metric space is given by a random walk rather than a Riemann tensor. The resulting notion of curvature formalizes the extent to which random walking brings points together. Applying the framework to Brownian motion on a manifold returns the classical definition of Ricci curvature. Curvature is determined by the ratio of two quantities: the earth mover's distance [30] between the random-walk neighborhoods of a pair of vertices, and the distance between the vertices themselves. Here the term random walk on a space X simply denotes a family of probability measures parameterized by points of X that satisfies reasonable assumptions. This includes biased walks such as MCMC. This approach has been useful for determining properties of a wide variety of graphs, including the internet topology [24] and cancer networks [31].

In this paper, we investigate curvature of the rSPR 
graph with respect to two random walks and compare 
those results to access times (i.e. hitting times) for 
those random walks. Our explicit focus here is to 
investigate random walks defined only in terms of the 
graph itself: the uniform random walk and MCMC 
sampling from the uniform prior on trees. In future 
work, we will extend these methods to study more 
complicated distributions with non-uniform topology 
probabilities. 

We required several new computational tools. We present a fast new algorithm for computing rSPR graphs from a set of trees, reducing the time to do so from O(m^2 n) to O(mn^3) for a set of m trees with n leaves. The full rSPR graph on trees with n leaves contains (2n − 3)!! = 3 · 5 · … · (2n − 3) trees. This is therefore a significant improvement in practice for exploring large subsets of the graph, or, as we do here, the full graph for small numbers of leaves. By exploiting symmetries in the rSPR graph, we were able to calculate all of the curvatures for pairs of trees with up to seven leaves. By carefully examining the overlap in rSPR moves, we present a new method for computing the degree of a tree in the rSPR graph. This method allows one to select an rSPR neighbor uniformly at random in linear time without explicitly generating the graph. This stands in contrast to the sampling methods used in current software such as MrBayes, which do not propose SPR moves uniformly.
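The vertex counts quoted in this paper follow directly from the double factorial (2n − 3)!!. As a quick check, the short Python sketch below reproduces, for example, the 10,395 and 135,135 vertices of the 7- and 8-taxon graphs discussed in Section 3.1. The function name num_rooted_trees is our own.

```python
def num_rooted_trees(n: int) -> int:
    """Number of rooted binary X-trees with n labelled leaves, (2n - 3)!!."""
    count = 1
    for factor in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n - 3
        count *= factor
    return count


for n in range(4, 9):
    print(n, num_rooted_trees(n))
# 4 15
# 5 105
# 6 945
# 7 10395
# 8 135135
```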

Using our methods to simulate these random walks, we found that the distribution of access times between pairs of trees can be described by the distance between the trees, the degrees of the trees, and the curvature. Moreover, we found that rSPR graphs for trees with 7 or more leaves have tree pairs with negative curvature. These correspond to direct paths that are difficult to traverse stochastically. A more fine-grained understanding of the rSPR neighborhood of pairs of vertices lets us give bounds on the earth mover's distance in this context, and thus on curvatures under these random walks. In particular, we present a full characterization of the change in rSPR degree that results from a given rSPR move. We find that rSPR moves which modify large subtrees are less likely to be explored during these random walks, even though each counts as one move. Pairs of trees separated by such moves correspond to the pairs with negative curvature identified in our simulation results. These pairs occur infrequently in these well-connected graphs. However, they may be more problematic in real posterior distributions, where the majority of probability is spread over a relatively small number of trees [44]. In summary, we extend knowledge about an important graph for phylogenetics, specifically in a way that models phylogenetic MCMC search.

Figure 1: (a) An X-tree T. (b) T(V), where V = {1, 2, 5}. (c) T|V. (d) An rSPR operation transforms T into a new tree T' by pruning a subtree and regrafting it in another location.

The automated computational analysis code can be found at https://github.com/matsengrp/curvature. Proofs of our theorems and lemmas can be found in the appendix.


2 Preliminaries 

We follow the definitions and notation from [3, 43, 44]. A (rooted binary phylogenetic) X-tree is a rooted tree T whose nodes have zero or two children, such that the leaves of T are bijectively labelled with the members of a label set X. As in [3, 43, 44], the tree is augmented with a labelled root node ρ, and ρ is considered a member of X (Fig. 1(a)). We generally use n to refer to the number of leaves in an X-tree. For a subset V of X, T(V) is the smallest subtree of T that connects all nodes in V (Fig. 1(b)). The V-tree induced by T is the smallest tree T|V that can be obtained from T(V) by suppressing unlabelled nodes with fewer than two children (Fig. 1(c)). For the rest of the paper, we assume that all phylogenetic trees are binary and rooted, and that tree inclusion is rooted tree inclusion.
A parent (sub)tree of a subtree U is the smallest subtree strictly containing U. A parent edge of a subtree U is the edge connecting U to the rest of the tree. The internal edges of a tree are the edges that do not contact a leaf or ρ. A ladder tree (also known as a caterpillar tree) is a tree in which every internal node has a leaf as a direct descendant. A balanced tree is a tree in which the sum of the depths of internal nodes is minimum over all trees with the same number of leaves. The least common ancestor (LCA) of a set R of two or more nodes is the unique node at maximum depth that is an ancestor of each node r ∈ R. Similarly, the LCA of two or more subtrees is the LCA of their parent nodes.

A (rooted) subtree-prune-regraft (rSPR) operation on an X-tree T cuts an edge e = (x, p_x), where p_x denotes the parent of node x. T is divided into two subtrees, T_x and T_{p_x}, containing x and p_x, respectively. The operation then adds a new node p'_x to T_{p_x} by subdividing an edge of T_{p_x}, and adds a new edge (x, p'_x), making x a child of p'_x. Finally, p_x is suppressed, joining the two edges on either side of that node. See Figure 1(d) for an example. The inclusion of ρ allows for rSPR moves which move subtrees to the root of the tree.

rSPR operations give rise to a distance measure between X-trees: d_rSPR(T_1, T_2) is the minimum number of rSPR operations required to transform an X-tree T_1 into T_2. For example, the trees in Figure 2 are separated by two rSPR operations. Moreover, rSPR operations naturally give rise to a graph on the set of X-trees for which this distance is simply the shortest-path graph distance. Let 𝒯_n be the set of trees with n leaves and label set X = {1, 2, ..., n, ρ}. Then the rSPR graph G of 𝒯_n is the graph with vertex set V(G) = 𝒯_n and edge set E(G) = {(T, S) | d_rSPR(T, S) = 1, T ∈ 𝒯_n, S ∈ 𝒯_n}.

Figure 2: Two rSPR operations, each of which moves one grey subtree. The leftmost and rightmost trees are rSPR distance two apart.

To avoid confusion between the two types of graph structures considered here, we refer to vertices of the rSPR graph as vertices and to vertices of individual trees (i.e., leaves and internal nodes) as nodes. Let N(T) be the set of rSPR neighbors of a tree T (this set does not include T). For example, the tree T with 4 leaves in Figure 3 has 10 neighbors. We say that the degree of T is |N(T)|, that is, the number of trees which can be obtained from T by a single rSPR operation. We assume that all trees are bifurcating, and thus use degree to refer only to the degree of rSPR graph vertices.
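To make N(T) and degree concrete, here is a brute-force Python sketch that enumerates N(T) directly from the definition of an rSPR operation. It assumes a toy representation that is not the paper's. A tree is a nested pair of tuples with integer leaf labels, and the root node ρ is left implicit above the outermost pair. Trees are compared after ordering each node's children by their smallest leaf label. The function names are ours. It tries every prune/regraft pair, so it is only meant for small n.

```python
def min_leaf(t):
    return t if isinstance(t, int) else min(min_leaf(t[0]), min_leaf(t[1]))


def canon(t):
    """Canonical form: the left child always holds the smaller minimum leaf label."""
    if isinstance(t, int):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if min_leaf(a) < min_leaf(b) else (b, a)


def subtrees(t, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, t
    if not isinstance(t, int):
        yield from subtrees(t[0], path + (0,))
        yield from subtrees(t[1], path + (1,))


def prune(t, path):
    """Remove the subtree at a non-empty path and suppress its former parent."""
    i = path[0]
    if len(path) == 1:
        return t[1 - i]
    rest = prune(t[i], path[1:])
    return (rest, t[1]) if i == 0 else (t[0], rest)


def regraft_all(t, s):
    """Yield every tree obtained by attaching s to an edge of t.
    (t, s) attaches s to the edge above t's root, i.e. the edge to rho."""
    yield (t, s)
    if not isinstance(t, int):
        for left in regraft_all(t[0], s):
            yield (left, t[1])
        for right in regraft_all(t[1], s):
            yield (t[0], right)


def rspr_neighbors(t):
    """N(T): all distinct trees one rSPR move away from t (t itself excluded)."""
    t = canon(t)
    result = set()
    for path, s in subtrees(t):
        if not path:  # the whole tree cannot be pruned
            continue
        for new in regraft_all(prune(t, path), s):
            result.add(canon(new))
    result.discard(t)
    return result


print(len(rspr_neighbors((((1, 2), 3), 4))))  # 10 (ladder tree)
print(len(rspr_neighbors(((1, 2), (3, 4)))))  # 12 (balanced tree)
```

For 4 leaves the sketch reports 10 neighbors for a ladder tree and 12 for a balanced tree, in agreement with Lemma 5.1.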

Ricci-Ollivier curvature provides a rigorous yet intuitive formalization of the shape of a metric space with respect to a random walk. For the purposes of this paper, we specialize to that space being a graph equipped with the shortest-path distance. For a more rigorous presentation in the more general setting of a Polish metric space, see [25] or the survey [26].

Let m_x and m_y be the probability densities of the position of a specified random walk after one step, starting at points x and y of a graph G = (V, E), respectively. The transportation distance [37] between m_x and m_y is also known as the Wasserstein distance or “earth mover's distance” [30]. It is the minimum amount of “work” required to move m_x to m_y along edges of the graph, that is,

$$W_1(m_x, m_y) := \min_{\xi \in \Pi(m_x, m_y)} \sum_{(z, w) \in V \times V} d(z, w)\, \xi(z, w), \tag{2}$$

where d(z, w) is the graph shortest-path distance (d_rSPR(z, w) in our case). Π(m_x, m_y) is the set of densities on V × V that equal m_x after projecting onto the first component and m_y after projecting onto the second.


The coarse Ricci-Ollivier curvature of x and y is then defined as:

$$\kappa(m; x, y) := 1 - \frac{W_1(m_x, m_y)}{d(x, y)}. \tag{3}$$

For the purposes of this paper, “curvature” without further specification refers to (3). We use κ(x, y) to denote the curvature of the simple random walk, which chooses a neighbor uniformly. We use κ(MH; x, y) to denote curvature with respect to the Metropolis-Hastings random walk sampling the uniform distribution, described in detail in Section 3.2. Positive curvature implies that the neighborhoods m_x and m_y are closer in transportation distance than point masses at x and y. Zero curvature implies that they are neither closer nor farther. Negative curvature implies that m_x and m_y are more distant than point masses at x and y. Curvature thus provides an intuitive measure of the difficulty of moving between regions of the graph with a random walk.
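The following Python sketch evaluates definitions (2) and (3) for the simple random walk on small graphs. To stay dependency-free, it handles only the special case |N(x)| = |N(y)|. In that case m_x and m_y are both uniform on k points, so by the Birkhoff-von Neumann theorem an optimal coupling in (2) can be taken to be a permutation, and W_1 can be found by brute force over the k! matchings. This restriction is our simplification. For general degrees, as in rSPR graphs, the transport problem has to be solved as a linear program, as described in Section 4.1. The example graphs are illustrative, not rSPR graphs, and the function names are ours.

```python
from collections import deque
from fractions import Fraction
from itertools import permutations


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def curvature_uniform(adj, x, y):
    """kappa(x, y) = 1 - W_1(m_x, m_y) / d(x, y) for the simple random walk.

    Assumes x != y and that x and y have the same degree k.
    """
    nx, ny = list(adj[x]), list(adj[y])
    if len(nx) != len(ny):
        raise ValueError("this sketch assumes |N(x)| == |N(y)|")
    dist = {z: bfs_distances(adj, z) for z in nx}
    # Each permutation matches the k atoms of m_x (mass 1/k each) to those of m_y.
    best = min(sum(dist[z][w] for z, w in zip(nx, perm))
               for perm in permutations(ny))
    w1 = Fraction(best, len(nx))
    return 1 - w1 / bfs_distances(adj, x)[y]


k4 = {v: [u for u in range(4) if u != v] for v in range(4)}  # complete graph K_4
c6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}       # 6-cycle
print(curvature_uniform(k4, 0, 1))  # 2/3
print(curvature_uniform(c6, 0, 1))  # 0
print(curvature_uniform(c6, 0, 2))  # 1/2
```

The complete graph comes out positively curved, and adjacent vertices of the cycle come out flat. This matches the reading of positive curvature as neighborhoods lying closer together than their centers.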



Figure 3: The neighborhood of an X-tree T with 4 
leaves, showing connections between neighbors. 





















Lin et al. [17] defined a variant of curvature in terms of lazy random walks, which Loisel and Romon [18] dubbed the asymptotic Ricci-Ollivier curvature. The lazy random walk travels according to m_x only with probability p and otherwise stays put. Thus the lazy mass assignment is the sum of p·m_x and a point mass of 1 − p on x. We denote by κ_p(m; x, y) the coarse curvature of the p-lazy random walk between two vertices x and y with respect to a random walk m. For example, κ_{1/4}(m; x, y) describes the curvature of the lazy random walk that follows the given random walk m with probability 1/4 and remains stationary with probability 3/4. The asymptotic Ricci-Ollivier curvature of x and y is then:

$$\mathrm{ric}(m; x, y) := \lim_{p \to 0} \frac{\kappa_p(m; x, y)}{p}. \tag{4}$$

As above for κ, we use ric(x, y) as shorthand for ric(m; x, y) when m is the uniform lazy random walk. We use ric(MH; x, y) when m is the Metropolis-Hastings random walk sampling the uniform distribution (Section 3.2). This definition of curvature is independent of p for small p. It can therefore be used to avoid parity problems on graphs where the uniform random walk is periodic, without choosing a specific laziness parameter. (For this purpose, Ollivier often considered κ_{1/2}(x, y).) As we prove in Lemma 6.7, coarse and asymptotic curvature are equal for nonadjacent vertices. Between adjacent vertices they differ only by a small factor bounded by max(|N(x)|, |N(y)|).

3 Efficient algorithms for computing and sampling rSPR graphs

3.1 Computing the rSPR graph of m trees with n leaves in O(mn^3)-time. To study the rSPR graph for a fixed number of leaves, we need an efficient method of constructing the full graph. The previous best algorithm for this problem requires O(m^2 n) time, where m is the number of trees in the graph and n the number of leaves [44]. Here we reduce that time to O(mn^3). For the full rSPR graph, m is the rapidly growing function (2n − 3)!!, that is, 3 · 5 · … · (2n − 3). As we demonstrate below, this is therefore a significant improvement in practice.

In previous work [44], we constructed (unrooted) SPR graphs from subsets of m high-probability trees sampled from phylogenetic posteriors, in order to compare mixing and identify local maxima. The SPR distance (rooted and unrooted) is NP-hard to compute [?], but it is fixed-parameter tractable with respect to the distance in the rooted case [?]. In particular, the algorithms of Whidden et al. [42, ?] can determine in O(n)-time whether two rooted phylogenetic trees are adjacent in the rSPR graph, or in O(n^2)-time for unrooted trees. We applied this method to every pair of the m trees to identify adjacencies, requiring a total of O(m^2 n)-time (O(m^2 n^2)-time in the unrooted case). However, because of the rapidly growing O(m^2) factor, this method is impractical for constructing graphs with 7 or more leaves.

The key to our efficient algorithm for computing dense rSPR graphs (those containing a significant portion of the full rSPR graph) is to avoid the pairwise comparison of non-adjacent trees, thereby shaving off an O(m) factor. The input to our algorithm is a set T of phylogenetic trees, each given as its O(n)-length Newick [46] string. These representations are made unique by ordering each tree so that leftmost subtrees contain the smallest alphanumeric label of descendants. We construct a mapping from each tree T_i to its index i in this list. Begin with an empty graph G. For each tree T_i, we first add a vertex i to the graph. We then use Corollary 3.4 below to enumerate the O(n^2) neighbors of T_i in the rSPR graph in O(n^3)-time. This efficient enumeration procedure is the key step required to achieve our desired running time of O(mn^3). We use the tree-to-index mapping to determine whether these neighbors are already vertices of the graph. If so, we add an edge in the graph from T_i to each such neighbor T_j. The high-level steps are listed below, and we show in Theorem 3.1 that this algorithm is correct and can be implemented to run in the stated time.


Construct-rSPR-Graph(T)

1. Let G be an empty graph.

2. Let M be a mapping from trees to integers.

3. Let i = 0.

4. For each of the m trees T_i:

(a) Add a vertex i to G representing the current tree T_i.

(b) Add T_i ↦ i to M.

(c) For each of the O(n^2) neighbors of T_i, enumerated using Enumerate-rSPR-Neighbors(T_i):

i. If the current neighbor T_j is in M, then add an edge (i, M[T_j]) to G.

(d) i = i + 1.
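The structure of Construct-rSPR-Graph can be sketched in Python as follows. The sketch assumes a toy representation: trees are nested pairs of tuples with integer leaves, ρ is implicit above the root, and children are ordered by smallest leaf label as a stand-in for the paper's ordered Newick strings. Its brute-force neighbor enumerator is our own and is slower than the paper's Enumerate-rSPR-Neighbors. Only the use of the mapping M to avoid pairwise comparisons mirrors the pseudocode. All function names are hypothetical.

```python
def min_leaf(t):
    return t if isinstance(t, int) else min(min_leaf(t[0]), min_leaf(t[1]))


def canon(t):
    """Unique form: the left child always holds the smaller minimum leaf label."""
    if isinstance(t, int):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if min_leaf(a) < min_leaf(b) else (b, a)


def regraft_all(t, s):
    """All trees obtained by attaching s to an edge of t; (t, s) uses the edge to rho."""
    yield (t, s)
    if not isinstance(t, int):
        for left in regraft_all(t[0], s):
            yield (left, t[1])
        for right in regraft_all(t[1], s):
            yield (t[0], right)


def prune_each(t):
    """(pruned subtree, remaining tree) for every non-root node of t."""
    if isinstance(t, int):
        return
    for i in (0, 1):
        child, sibling = t[i], t[1 - i]
        yield child, sibling                   # prune a child of the root
        for s, rest in prune_each(child):      # prune deeper inside child
            yield s, ((rest, sibling) if i == 0 else (sibling, rest))


def enumerate_rspr_neighbors(t):
    t = canon(t)
    out = {canon(new) for s, rest in prune_each(t) for new in regraft_all(rest, s)}
    out.discard(t)
    return out


def all_trees(n):
    """All (2n - 3)!! rooted binary trees on leaves 1..n (n >= 2), by leaf insertion."""
    trees = [(1, 2)]
    for leaf in range(3, n + 1):
        trees = [new for t in trees for new in regraft_all(t, leaf)]
    return [canon(t) for t in trees]


def construct_rspr_graph(trees):
    """Return (number of vertices, edge list) of the induced rSPR subgraph."""
    index = {}                                 # the mapping M: tree -> integer
    edges = []
    for i, tree in enumerate(trees):           # vertex i represents T_i
        tree = canon(tree)
        index[tree] = i
        for neighbor in enumerate_rspr_neighbors(tree):
            j = index.get(neighbor)
            if j is not None:                  # neighbor is already a vertex
                edges.append((j, i))           # so each edge is added exactly once
    return len(trees), edges


m, edges = construct_rspr_graph(all_trees(4))
print(m, len(edges))  # 15 78
```

The 15 vertices of the 4-leaf graph comprise 12 ladder trees of degree 10 and 3 balanced trees of degree 12, which gives (12·10 + 3·12)/2 = 78 edges.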


Theorem 3.1. The subgraph of the rSPR graph induced by a set T of m trees with n leaves can be constructed in O(mn^3)-time.

We implemented this procedure in the C++ program dense_spr_graph of the software package spr_neighbors [40]. The program outputs the graph in edge-list format, suitable for input to other software. On an Intel Core 2 Duo E7500 desktop running Ubuntu 14.04, the construction procedure reduced the time required to compute the 10,395-vertex 7-taxon rSPR graph from 2,104.68 seconds to 12.71 seconds. We do not study the 135,135-vertex 8-taxon rSPR graph in this paper, but our algorithm required only 303.45 seconds to construct it on the same hardware. Constructing the 8-taxon rSPR graph using the previous method required 377,395 seconds (more than 4 days), so that method is infeasible for constructing larger rSPR graphs. Because the algorithm can quickly construct rSPR graphs for any given subset of trees, we believe it will itself be useful for further studies of rSPR graph subsets similar to [44].


3.2 Simulating random walks on the rSPR graph. The uniform random walk moves from one vertex to one of its neighbors uniformly at random, which makes this walk more likely to sample higher-degree vertices. In contrast, the Metropolis-Hastings (MH) random walk with constant likelihood function proposes a move from a tree T to a neighbor tree S uniformly at random. It then accepts the move according to the Hastings ratio, min(1, |N(T)|/|N(S)|). The MH random walk has the uniform distribution over trees as its stationary distribution. It is therefore representative of a phylogenetic MCMC program sampling trees under a uniform prior.
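Why the Hastings ratio min(1, |N(T)|/|N(S)|) yields a uniform stationary distribution can be checked on a toy example. Under this rule the transition probability P(T, S) = min(1/|N(T)|, 1/|N(S)|) is symmetric in T and S, so every column of the transition matrix sums to 1. The minimal Python sketch below is our own. It builds these probabilities with exact rational arithmetic on a small, deliberately irregular graph (not an rSPR graph) and checks the column sums.

```python
from fractions import Fraction


def mh_transition_matrix(adj):
    """P[t][s] for the MH walk that targets the uniform distribution.

    Proposal: a uniformly random neighbor s of t (probability 1/|N(t)|).
    Acceptance: min(1, |N(t)| / |N(s)|); rejected proposals stay at t.
    """
    P = {t: {} for t in adj}
    for t, nbrs in adj.items():
        stay = Fraction(1)
        for s in nbrs:
            accept = min(Fraction(1), Fraction(len(nbrs), len(adj[s])))
            P[t][s] = Fraction(1, len(nbrs)) * accept
            stay -= P[t][s]
        P[t][t] = stay
    return P


# Toy graph: a triangle {0, 1, 2} plus a pendant vertex 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
P = mh_transition_matrix(adj)
columns_sum_to_one = all(sum(P[t].get(s, 0) for t in adj) == 1 for s in adj)
print(columns_sum_to_one)  # True: the uniform distribution is stationary
```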

To efficiently simulate the MH random walk, we developed a linear-time algorithm for proposing rSPR moves that does not require the rSPR graph to be explicitly built and stored in memory. A naive approach would require O(n^3) time: O(n) time to generate each of the O(n^2) neighbors of a given tree, so that one could be picked uniformly at random. To eliminate an O(n^2) factor, we developed a deterministic ordering of rSPR moves that is in one-to-one correspondence with rSPR neighbors, as described in the next paragraph. Given such an order, a uniform neighbor can be selected by its index in O(n) time. The recursive degree formula of Song [32] does not group the rSPR moves that move a particular subtree. It would therefore still require O(n^2) time to select a specific rSPR neighbor by index.

We consider the distribution of rSPR moves in terms of the number of nodes contained within a subtree. Recall that a tree with n leaves has 2n − 1 total nodes, ignoring the artificial ρ node. Given a subtree R with x nodes, there are 2n − 1 − x possible locations to regraft R. However, some of these moves result in the same tree as other rSPR moves. For any node, call the edge connecting the subtree rooted at that node to the rest of the tree that node's edge. Then:

i. Moving R to its sibling edge results in the same tree, not a neighboring tree,

ii. Moving R to its parent edge results in the same tree,

iii. Moving R to its grandparent edge is the same as moving its aunt to its sibling edge, and

iv. Moving R to its aunt edge is the same as moving its aunt to R's edge.

We prove in Lemma 3.2 that this list is exhaustive: every other pair of R and destination edge e results in a unique rSPR neighbor. We assign (2n − 1 − x) − 2 moves to each child of the original non-ρ root, since these nodes have neither an aunt nor a grandparent. We assign (2n − 1 − x) − 4 moves to each other non-root node. Let N(T, u) denote the neighbors of T assigned to node u, obtained by moving the subtree R rooted at u. We thus obtain a new method for computing the neighborhood size:

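Lemma 3.2. For a tree T with n leaves,

$$|N(T)| = \sum_{u \in T} |N(T, u)|,$$

where u ranges over the nodes of T, N(T, u) is as defined above, x is the number of nodes in the subtree rooted at u, and

$$|N(T, u)| = \begin{cases} 2n - x - 5 & \text{if } \operatorname{depth}(u) > 1, \\ 2n - x - 3 & \text{if } \operatorname{depth}(u) = 1, \\ 0 & \text{if } \operatorname{depth}(u) < 1. \end{cases}$$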
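Lemma 3.2 turns the degree computation into a single tree traversal. The minimal Python sketch below assumes trees are nested pairs of tuples with integer leaf labels, with the root's parent ρ left implicit. It computes x bottom-up as the number of nodes in each subtree. The function name rspr_degree is ours.

```python
def rspr_degree(t):
    """|N(T)| by Lemma 3.2, for a tree given as nested pairs with integer leaves."""
    def leaves(node):
        return 1 if isinstance(node, int) else leaves(node[0]) + leaves(node[1])

    n = leaves(t)
    total = 0

    def visit(node, depth):
        """Add |N(T, u)| for every node u in this subtree; return its node count x."""
        nonlocal total
        if isinstance(node, int):
            x = 1
        else:
            x = 1 + visit(node[0], depth + 1) + visit(node[1], depth + 1)
        if depth == 1:                 # child of the non-rho root
            total += 2 * n - x - 3
        elif depth > 1:
            total += 2 * n - x - 5
        return x                       # the root (depth 0) contributes 0

    visit(t, 0)
    return total


print(rspr_degree((((1, 2), 3), 4)))   # 10 (ladder tree)
print(rspr_degree(((1, 2), (3, 4))))   # 12 (balanced tree)
```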

In particular, this formulation implies a total ordering of rSPR moves in which all moves of the same subtree R form a contiguous subsequence. We can thus apply the following algorithm to select a neighbor of a tree T uniformly at random:

Select-rSPR-Neighbor(T)

1. Compute the degree of T, |N(T)|, using Lemma 3.2.

2. Pick a random integer r in the range [1, |N(T)|].

3. Label each node u of T by its preorder number and compute the number of nodes in the subtree rooted at each u.

4. For each tree node u, in preorder, and while r > 0:

(a) Decrease r by |N(T, u)|.

(b) If r ≤ 0, let S be member number |r| of N(T, u), counting from 0, and terminate the for loop.

5. Return the neighbor S.
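Steps 1 to 4 can be sketched in Python as below; all function names are ours. The sketch stops at the pair (u, k), which identifies the k-th move (counting from 0) of the subtree rooted at u. Materializing the tree S would also require the fixed ordering of regraft edges within each block, which we do not reproduce here. Every r in [1, |N(T)|] maps to a distinct pair, so a uniform r gives a uniform neighbor. Trees are assumed to be nested pairs of tuples with integer leaves.

```python
import random


def move_counts(t):
    """|N(T, u)| for every node u of t in preorder (Lemma 3.2)."""
    def leaves(node):
        return 1 if isinstance(node, int) else leaves(node[0]) + leaves(node[1])

    n = leaves(t)
    counts = []

    def visit(node, depth):
        slot = len(counts)             # preorder position of this node
        counts.append(0)               # the root (depth 0) keeps 0
        if isinstance(node, int):
            x = 1
        else:
            x = 1 + visit(node[0], depth + 1) + visit(node[1], depth + 1)
        if depth == 1:
            counts[slot] = 2 * n - x - 3
        elif depth > 1:
            counts[slot] = 2 * n - x - 5
        return x

    visit(t, 0)
    return counts


def select_move(counts, r):
    """Step 4: map r in [1, |N(T)|] to (node u, member number within N(T, u))."""
    for u, c in enumerate(counts):
        r -= c
        if r <= 0:
            return u, -r               # member number |r|, counting from 0
    raise ValueError("r must lie in [1, |N(T)|]")


def random_move(t, rng=random):
    counts = move_counts(t)
    r = rng.randint(1, sum(counts))    # steps 1-2: |N(T)| and a uniform r
    return select_move(counts, r)


ladder = (((1, 2), 3), 4)
print(move_counts(ladder))             # [0, 0, 0, 2, 2, 2, 4]
print(random_move(ladder, random.Random(0)))
```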
Figure 4: Scatter plot of κ(MH; T_1, T_2) values versus d_rSPR(T_1, T_2) for the rSPR graph. Color displays the average degree of T_1 and T_2. Distance values are randomly perturbed (“jittered”) a small amount to avoid superimposed points.


Lemma 3.3. An rSPR neighbor of a tree T can be chosen uniformly at random in O(n)-time using O(n) space.

This procedure can easily be adapted to explore the full neighborhood of a tree in O(n^3) time, which we use for Theorem 3.1. We call the resulting procedure Enumerate-rSPR-Neighbors(T). We thus have the following corollary:

Corollary 3.4. The rSPR neighbors of a tree T can be enumerated in O(n^3)-time.


We implemented this procedure in the C++ package random_spr_walk. We sampled a 200,000-iteration random walk on the 4-leaf rSPR graph and a 50,000-iteration random walk on the 5-leaf rSPR graph.


4 Access times of random walks on the rSPR 
graph can be understood using distance, 
degree, and curvature 


4.1 Computing curvature values. To compute curvature values, we first used dense_spr_graph to compute the rSPR graph for four to seven leaves, as discussed in Section 3.1. We then computed curvatures for given pairs of trees directly. We used linear programming [18] to compute the minimal mass transport W_1, via the SAGE [35] front-end to the GLPK solver. The code can be found in [20], which grew from the code described in [18].

Directly computing curvatures for all ((2n − 3)!!)^2 pairs of trees with n leaves would have required an enormous amount of computation, even for the small values of n we consider here. We instead exploited the fact that pairs of trees which are equivalent modulo label renumbering are symmetric in the rSPR graph. Such pairs are therefore guaranteed to have the same curvature. For example, the pairs {(((1, 2), 3), 4), ((1, 2), (3, 4))} and {(((1, 4), 2), 3), ((1, 4), (2, 3))} are the same after relabeling, so their curvatures are the same. We thus directly computed curvature values for one representative pair from each such equivalence class, or tanglegram [36]. The group-theoretic enumeration methods are described in a manuscript in preparation, and the SAGE [35] and GAP4 code is at [?].
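The symmetry argument can be illustrated with a brute-force Python sketch. This is our own illustration; the paper's enumeration uses group-theoretic methods in SAGE and GAP4. Two ordered pairs of trees are equivalent when a single relabeling maps both trees of the first pair onto the second. Taking the lexicographically smallest pair of canonical strings over all relabelings therefore gives one key per class. The sketch costs n! relabelings, so it is only practical for small n. Trees are assumed to be nested pairs of tuples with integer leaves, and the function names are hypothetical.

```python
from itertools import permutations


def relabel(t, f):
    return f[t] if isinstance(t, int) else (relabel(t[0], f), relabel(t[1], f))


def newick(t):
    """Newick-like string with children sorted, so equal unordered trees match."""
    if isinstance(t, int):
        return str(t)
    a, b = sorted((newick(t[0]), newick(t[1])))
    return f"({a},{b})"


def tanglegram_key(t1, t2, labels):
    """Smallest (T1, T2) string pair over all relabelings of the leaves."""
    best = None
    for perm in permutations(labels):
        f = dict(zip(labels, perm))
        key = (newick(relabel(t1, f)), newick(relabel(t2, f)))
        if best is None or key < best:
            best = key
    return best


labels = [1, 2, 3, 4]
k1 = tanglegram_key((((1, 2), 3), 4), ((1, 2), (3, 4)), labels)
k2 = tanglegram_key((((1, 4), 2), 3), ((1, 4), (2, 3)), labels)
print(k1 == k2)  # True: same tanglegram, hence the same curvature
```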

We find a wide variation in curvature among tanglegrams (Figure 4). Curvature values tended to increase with increasing rSPR distance, and their variance decreased with increasing distance. For a given number of leaves, neighboring trees achieved the minimum curvature values. Maximum curvature values occurred between trees at maximum distance or one rSPR move closer than the maximum. This suggests that, in the highly connected rSPR graph, larger curvature may somewhat temper the increased difficulty that distance poses for a random walk.

Larger rSPR graphs tended to have smaller curvature values. Indeed, the 7-leaf rSPR graph contained adjacent pairs of trees with negative curvature. Such pairs indicate difficult paths for phylogenetic searches, which may be exacerbated by likelihood or branch length constraints.


4.2 Access time simulation. The access time for a pair of vertices in a graph is the (random) number of iterations a random walk requires to go from one of the vertices to the other [19]. We were interested in the connection between curvature and access time. In previous work, we computed mean access times (MAT) between pairs of trees in MCMC random walks, that is, the mean number of iterations required to move from one tree to the other. We applied this work to demonstrate the influence of SPR graph structure on real MCMC posteriors sampled with MrBayes [44], using sprspace [41].

Figure 5: Distribution of rSPR MH access times for those pairs of 5-taxon trees with degree 24 that are not simple inclusions of 4-taxon pairs of trees. Color signifies the rSPR distance between the trees: green, orange, and blue signify distances of 1, 2, and 3, respectively. The saturation of the color shows the coarse curvature κ(MH; ·, ·); increased saturation (i.e., darker color) indicates a smaller κ.

Table 1: p-values for ordinary least squares linear multiple regression of rSPR mean access time against degree and distance (two-tailed t-test of regression coefficient). The p-values for 7 taxa are smaller than the machine precision used to calculate them.

variable      5 taxa       6 taxa       7 taxa
T_1 degree    2.425e-07    2.726e-55    0
T_2 degree    0.04367      4.302e-21    0
d_rSPR        5.026e-09    1.104e-44    0

Table 2: p-values for ordinary least squares linear multiple regression of rSPR δ_1 against degree, distance, and κ (two-tailed t-test of regression coefficient).

variable      5 taxa       6 taxa       7 taxa
T_1 degree    9.376e-05    2.944e-07    5.51e-09
T_2 degree    0.2366       0.1432       0.1687
d_rSPR        5.151e-06    0.0007557    3.276e-23
κ(MH)         4.462e-06    1.436e-22    1.459e-46

Here, to gain more insight, we used simulation to approximate the entire access time distribution. Again we use the fact that the access time for a pair of trees under a simple random walk depends only on the relative labeling of those trees, not on their actual labeling. Enumerating access times between all pairs of trees with enough samples for accurate estimates would have required a tremendous amount of memory and computational power. We therefore enumerate times between pairs of trees within each tanglegram. To calculate the empirical distributions of access times, we aggregate all access times for the same tanglegram using our group-theoretic methods [?].
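A Monte Carlo estimate of an access-time distribution can be sketched in Python as follows. This is our own sketch, not the authors' random_spr_walk. It repeatedly runs the MH walk from a start vertex until the walk first reaches the target, and tallies the number of iterations. The graph is a toy placeholder given as an adjacency list. On a triangle every proposal is accepted and each step hits the target with probability 1/2, so the estimated mean access time should be close to 2.

```python
import random
from collections import Counter


def mh_step(adj, t, rng):
    """One MH step targeting the uniform distribution on vertices."""
    s = rng.choice(adj[t])                                   # uniform proposal
    if rng.random() < min(1.0, len(adj[t]) / len(adj[s])):   # Hastings ratio
        return s
    return t                                                 # rejected: stay put


def access_time_histogram(adj, start, target, walks, rng):
    """Counter mapping access time -> number of simulated walks (start != target)."""
    counts = Counter()
    for _ in range(walks):
        v, steps = start, 0
        while v != target:
            v = mh_step(adj, v, rng)
            steps += 1
        counts[steps] += 1
    return counts


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # toy graph, not an rSPR graph
hist = access_time_histogram(triangle, 0, 1, walks=20000, rng=random.Random(1))
mean = sum(t * c for t, c in hist.items()) / sum(hist.values())
print(round(mean, 2))
```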

We find that the mean access time between trees $T_1$ and $T_2$ is determined by $|N(T_1)|$ and $|N(T_2)|$ (Table ?). Furthermore, plotting the distribution of access times between pairs of trees with respect to their distance and curvature suggests that smaller $\kappa$ slightly shifts the distribution of access times towards larger access times (Fig. 5(a)). We quantify this effect by defining $\delta_1$ to be the difference between the first pair of consecutive access time counts such that the second entry in the pair is nonzero. For example, $\delta_1$ for distance 1 pairs (green lines in Fig. 5(a)) is the count for time 1 minus the count for time 2, while for distance 3 pairs (blue lines in Fig. 5(a)) it is the count for time 2 minus the count for time 3. Regression finds a clear influence of $\kappa$ on $\delta_1$ (Table ?). This confirms the intuitive interpretation of $\kappa(T_1, T_2)$ as quantifying the propensity of a random walk to go from $T_1$ to $T_2$ relatively directly, certainly before the random walk achieves stationarity. On the other hand, if the random walk starting from $T_1$ does not quickly arrive at $T_2$ and instead achieves stationarity, the original position of the random walk is forgotten, and the access time is then a standard exponentially distributed waiting time for an event in a Poisson process (Fig. 5(b)).
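The $\delta_1$ statistic can be computed directly from a table of access-time counts. This sketch assumes the counts are stored in a dictionary keyed by access time, starting at 1 (our assumption about the data layout); the function name `delta1` is illustrative. The two usage lines mirror the distance-1 and distance-3 cases described above, with made-up counts.

```python
def delta1(counts):
    """The delta_1 statistic defined in the text.

    counts maps an access time t >= 1 to the number of walks with that access
    time. Finds the first consecutive pair (t, t + 1) whose second count is
    nonzero and returns count(t) - count(t + 1).
    """
    last = max(t for t, c in counts.items() if c)  # ValueError if all counts are 0
    for t in range(1, last):
        if counts.get(t + 1, 0):
            return counts.get(t, 0) - counts[t + 1]
    raise ValueError("no consecutive pair with a nonzero second count")


# The two cases described in the text (counts invented for illustration):
print(delta1({1: 50, 2: 30, 3: 10}))  # distance-1 style: count(1) - count(2) = 20
print(delta1({3: 40, 4: 20, 5: 5}))   # distance-3 style: count(2) - count(3) = -40
```

For a distance-3 pair the counts at times 1 and 2 are zero, so $\delta_1$ is simply minus the count at time 3; a larger magnitude means more walks arrive by the most direct route.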

The analysis can be reproduced by invoking the SCons (http://scons.org/) build tool and running the cells in an IPython notebook; instructions are in the repository README file.

5 Rooted SPR Neighborhoods 

Having made the connection between curvature values and access times on rSPR graphs, we now consider curvature theoretically. We begin by bounding differences between degrees, and then consider features relevant to the earth mover's distance that we call “squares” and “triangles” in the rSPR graph. Many of our results in this section follow from a characterization of the change in degree and in the distribution of permissible rSPR moves after an rSPR move is applied.

Lemma 5.1. (Song [32]) For a tree $T$ with $n$ leaves:

i. $|N(T)| = 3n^2 - 13n + 14$, if $T$ is a ladder tree,
ii. $|N(T)| = 4(n-2)^2 - 2\sum_{m=1}^{n-2} \lfloor \log_2(m+1) \rfloor$, if $T$ is a balanced tree, and
iii. $3n^2 - 13n + 14 < |N(T)| < 4(n-2)^2 - 2\sum_{m=1}^{n-2} \lfloor \log_2(m+1) \rfloor$ otherwise.

We now bound the ratio and difference of rSPR 
degree between two trees with n leaves. 

Lemma 5.2. Let $T, S$ be trees with $n \ge 3$ leaves, and assume w.l.o.g. that $|N(T)| \le |N(S)|$. Then:

i. $|N(T)|/|N(S)| \ge 3/4$, and
ii. $|N(S)| - |N(T)| \le n^2 - 5n + 6$.
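The closed forms of Lemma 5.1 and the bounds of Lemma 5.2 are easy to tabulate and spot-check numerically. A minimal sketch (function names are ours); it uses the integer bit length to compute $\lfloor \log_2(m+1) \rfloor$ exactly, and checks Lemma 5.2 only on the extreme pair (ladder versus balanced), which by Lemma 5.1 is the worst case for both the ratio and the difference.

```python
from fractions import Fraction


def ladder_degree(n):
    """Lemma 5.1(i): rSPR degree of the n-leaf ladder tree (the minimum)."""
    return 3 * n * n - 13 * n + 14


def balanced_degree(n):
    """Lemma 5.1(ii): rSPR degree of the n-leaf balanced tree (the maximum).

    (m + 1).bit_length() - 1 equals floor(log2(m + 1)) exactly for m >= 1.
    """
    return 4 * (n - 2) ** 2 - 2 * sum((m + 1).bit_length() - 1 for m in range(1, n - 1))


def lemma_5_2_holds(n):
    """Check both bounds of Lemma 5.2 on the extreme pair for n leaves."""
    lo, hi = ladder_degree(n), balanced_degree(n)
    return Fraction(lo, hi) >= Fraction(3, 4) and hi - lo <= n * n - 5 * n + 6


for n in range(3, 8):
    print(n, ladder_degree(n), balanced_degree(n))
print(all(lemma_5_2_holds(n) for n in range(3, 301)))  # True
```

Exact `Fraction` comparison avoids any floating-point doubt about the $3/4$ threshold.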

We can improve these bounds in the case of adjacent trees. To do so, we require the following lemma, which characterizes how the degree of a tree changes after an rSPR operation. See Figure 6 for an illustration.

Lemma 5.3. Let T and S be trees such that S can be 
obtained from T by moving a subtree R with k leaves 
from its position adjacent to subtree U to a location 
adjacent to subtree V. Let L be the LCA(U,V) in T. 
Let a be the number of intermediate nodes on the path 
from the parent of R to L in T, excluding endpoints. 
Similarly, let b be the number of intermediate nodes on 
the path from V to L in T, excluding endpoints. Let i be 
the number of leaves in U and j be the number of leaves 
in V, excluding any leaves of R. Then the degrees of T 
and S differ by: 

$$2\,(k(a - b) + i - j).$$


Moreover, we can use these ideas to determine the number of rSPR moves that are, in some respects, independent of a given rSPR move. That is, for two trees $S$ and $T$ differing by a single rSPR move, we wish to know the number of rSPR moves that are applicable to both trees rather than unique to one of the trees. To formalize this concept, consider pairs of trees $T' \in N(T)$ and $S' \in N(S)$ such that $d_{\mathrm{SPR}}(T', S') = 1$. The number of such “squares” involving two adjacent trees will play a key role in our curvature bounds, as they push the curvature of those trees towards 0.

Corollary 5.4. Continuing with the setting and notation in Lemma 5.3, at least

$$\gamma := \deg(T) - 2kb - 2(j - 1) = \deg(S) - 2ka - 2(i - 1)$$

trees in the neighborhood of $T$ can be paired with $\gamma$ trees in the neighborhood of $S$ such that the pairings are disjoint and $d_{\mathrm{SPR}}(T', S') = 1$ for each $(T', S')$ pair.
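The quantities in Lemma 5.3 and Corollary 5.4 are simple enough to evaluate by hand, but a small sketch makes the bookkeeping explicit (function names are ours). The worked example uses the 4-leaf ladder $(((w,x),y),z)$: moving leaf $y$ next to $z$ gives the balanced tree $((w,x),(y,z))$, with $k = 1$, $a = b = 0$, $i = 2$, $j = 1$, and Lemma 5.1 gives degrees 10 and 12.

```python
def degree_change(k, a, b, i, j):
    """Lemma 5.3: |N(S)| - |N(T)| when a k-leaf subtree R moves from U to V.

    a, b: intermediate nodes on the paths parent(R) -> L and V -> L in T;
    i, j: numbers of leaves of U and V (excluding leaves of R).
    """
    return 2 * (k * (a - b) + i - j)


def gamma_from_t(deg_t, k, b, j):
    """Corollary 5.4, first form: deg(T) - 2kb - 2(j - 1)."""
    return deg_t - 2 * k * b - 2 * (j - 1)


def gamma_from_s(deg_s, k, a, i):
    """Corollary 5.4, second form: deg(S) - 2ka - 2(i - 1)."""
    return deg_s - 2 * k * a - 2 * (i - 1)


# Ladder (((w,x),y),z) -> ((w,x),(y,z)): k = 1, a = b = 0, i = 2, j = 1.
deg_t = 10  # Lemma 5.1(i) with n = 4
deg_s = deg_t + degree_change(k=1, a=0, b=0, i=2, j=1)
print(deg_s, gamma_from_t(deg_t, k=1, b=0, j=1), gamma_from_s(deg_s, k=1, a=0, i=2))
```

The predicted degree of $S$ is 12, matching Lemma 5.1(ii) for $n = 4$, and the two expressions for $\gamma$ agree, as they must by Lemma 5.3.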

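We can now use Lemma 5.3 to improve the bounds in Lemma 5.2 for two adjacent trees.

Lemma 5.5. Let $T, S$ be trees with $n \ge 3$ leaves such that $|N(T)| \le |N(S)|$ and $d_{\mathrm{SPR}}(T, S) = 1$. Then:

i. $|N(S)| - |N(T)| \le 2\lfloor \frac{n-2}{2} \rfloor \lceil \frac{n-2}{2} \rceil \le \frac{1}{2}(n-2)^2$,
ii. $|N(T)|/|N(S)| \ge 5/6$ for all $n \ge 4$, and
iii. $\lim_{n \to \infty} |N(T)|/|N(S)| \ge 6/7$.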

Next, we bound the number of neighbors shared by two adjacent trees. The number of such “triangles” involving two adjacent trees plays a key role in determining whether their curvature is positive or negative.

Lemma 5.6. Let $T$ and $S$ be trees such that $d_{\mathrm{SPR}}(T, S) = 1$. Then $|N(T) \cap N(S)| \le 6n - 17$.

6 Curvature 

We now consider properties of the uniform (a.k.a. isotropic) random walk on the $n$-leaf rSPR graph. Recall that the uniform random walk begins at a tree $T$ and moves to a tree chosen uniformly at random from $N(T)$. Recall also that the coarse uniform random walk curvature between two trees $T$ and $S$ is $\kappa(T, S) := 1 - W_1(m_T, m_S)/d_{\mathrm{SPR}}(T, S)$, where $W_1$ is the mass transport term [?]. For the uniform random walk, $m_T$ is the probability measure assigning a mass of $1/|N(T)|$ to each of the $|N(T)|$ neighbors of $T$. Our results follow from the lemmas of Section 5.
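For small graphs, $\kappa(T, S)$ can be computed straight from this definition by solving the mass transport problem as a linear program. The sketch below assumes a connected graph stored as an adjacency dictionary and uses SciPy's `linprog` with the HiGHS solver; the helper names are ours, and this is not the code used for the computations in this paper.

```python
from collections import deque

import numpy as np
from scipy.optimize import linprog


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def uniform_measure(adj, t):
    """m_T for the uniform random walk: mass 1/|N(T)| on each neighbor of T."""
    return {s: 1.0 / len(adj[t]) for s in adj[t]}


def wasserstein1(adj, mu, nu):
    """W_1(mu, nu) with graph distance as ground cost, solved as a transportation LP."""
    xs, ys = list(mu), list(nu)
    rows, cols = len(xs), len(ys)
    dist = {x: bfs_distances(adj, x) for x in xs}
    # Variable r * cols + c is the mass sent from xs[r] to ys[c].
    cost = np.array([[dist[x][y] for y in ys] for x in xs], dtype=float).ravel()
    a_eq = np.zeros((rows + cols, rows * cols))
    for r in range(rows):
        a_eq[r, r * cols:(r + 1) * cols] = 1.0  # all mass at xs[r] leaves
    for c in range(cols):
        a_eq[rows + c, c::cols] = 1.0  # all mass required at ys[c] arrives
    b_eq = np.array([mu[x] for x in xs] + [nu[y] for y in ys])
    res = linprog(cost, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def curvature(adj, t, s, measure=uniform_measure):
    """Coarse Ricci-Ollivier curvature 1 - W_1(m_T, m_S) / d(T, S)."""
    d = bfs_distances(adj, t)[s]
    return 1.0 - wasserstein1(adj, measure(adj, t), measure(adj, s)) / d


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # the 3-leaf rSPR graph
print(round(curvature(triangle, 0, 1), 6))  # 0.5
```

The LP has one variable per (source, target) pair, so this is only practical for small neighborhoods; the optional `measure` argument lets another one-step distribution be substituted.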

Theorem 6.1. Fix a positive integer $k$ and let $R$ be a tree with $k$ leaves. Let $\{T_n \mid n > k\}$ be a sequence of trees all containing $R$, and let $\{S_n \mid n > k\}$ be the same sequence $T_n$ but with $R$ cut off and attached at a different location. Then $\lim_{n \to \infty} \kappa(T_n, S_n) = 0$ for the uniform random walk on the rSPR graph.











Figure 6: An rSPR move labelled as in Lemma 5.3. Moving the grey subtree $R$ from its position adjacent to $U$ in tree $T$ to its position adjacent to $V$ in tree $S$ changes the rSPR degree by $2(k(a - b) + i - j)$.


Next we note a simple and rough bound on the 
curvature of two trees with respect to their distance, 
then obtain a tighter bound on the maximum curvature 
of two adjacent trees. 

Lemma 6.2. Let $T$ and $S$ be two trees. Then:

$$-\frac{2}{d_{\mathrm{SPR}}(T, S)} \le \kappa(T, S) \le \frac{2}{d_{\mathrm{SPR}}(T, S)}.$$

Lemma 6.3. The maximum curvature between two adjacent trees with $n$ leaves is $\frac{6n - 17}{3n^2 - 13n + 14}$.

This bound is tight and has been verified computationally for $n \le 7$.
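Evaluating the Lemma 6.3 formula for small $n$ shows how quickly the maximum curvature of adjacent trees decays. A minimal sketch with exact fractions (the function name is ours); for $n = 3$ the rSPR graph is a triangle, where the value $1/2$ is easy to confirm by hand.

```python
from fractions import Fraction


def max_adjacent_curvature(n):
    """Lemma 6.3: (6n - 17) / (3n^2 - 13n + 14) for adjacent n-leaf trees."""
    return Fraction(6 * n - 17, 3 * n * n - 13 * n + 14)


for n in range(3, 8):
    print(n, max_adjacent_curvature(n))
```

The denominator is the ladder degree from Lemma 5.1, so the bound behaves like $2/n$ for large $n$.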

It is more difficult to obtain a closer bound on the maximum curvature of nonadjacent trees. Lemma 6.2 suggests that more distant pairs of trees should have smaller curvatures than close trees, as neighborhood effects decrease with increasing distance. However, our experiments with $n \le 7$ suggest that maximum curvature tends to increase with distance (for a fixed $n$), as a far greater fraction of the neighbors approach each other as the distance increases. Indeed, for $5 \le n \le 7$ the maximum curvature is obtained by pairs of trees at one less than the maximum distance. Moreover, nearly all of the neighbors of these pairs approach each other. We thus conjecture the following:

Conjecture 6.4. Let $\kappa_n$ be the maximum curvature between two trees with $n$ leaves. Then:


*• < A^sPuin)-! ’ 

**• ArSPR{n)-l ■ 

Proving or disproving this conjecture would go a long way toward understanding the effects of relative distance on curvature. However, we suspect that this will require a greater understanding of the distribution of tree neighborhoods with respect to one another than is currently known. Next, we bound the minimum curvature of two adjacent trees.

Lemma 6.5. The curvature between adjacent trees with $n$ leaves is at least

$$\frac{-n^2 + 2n}{3.5n^2 - 15n + 16}.$$

We further observe that the limit of our curvature lower bound is $-2/7$. Complete enumeration with $n \le 7$ shows that no pair of trees has curvature less than $-2/5$, and our bound meets or exceeds this value for $n > 7$. Moreover, the rSPR distance is a metric, so this bounds the curvature for arbitrary pairs of trees (Proposition 19 of [?]). This directly leads to the following corollary:

Corollary 6.6. The curvature between two trees is at least $-2/5$.

Note that this bound is not tight (at least for small $n$), as it is rarely necessary to transport mass the maximum distance between unpaired trees. We also note that the lower bounds in this section do not follow from the more general setting described in [14]. However, the pair of trees used in the proof of Lemma 6.5 will always have negative curvature, for all $n > 7$.
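The Lemma 6.5 bound can be evaluated exactly to see where it crosses $-2/5$ and what it tends to. A minimal sketch (function name ours): the bound equals $-2/5$ at $n = 8$, is weaker than $-2/5$ for $3 \le n \le 7$, and approaches $-1/3.5 = -2/7$ as $n$ grows.

```python
from fractions import Fraction


def adjacent_curvature_lower_bound(n):
    """Lemma 6.5: (-n^2 + 2n) / (3.5 n^2 - 15 n + 16), in exact arithmetic."""
    return Fraction(-n * n + 2 * n) / (Fraction(7, 2) * n * n - 15 * n + 16)


print(adjacent_curvature_lower_bound(8))             # -2/5
print(float(adjacent_curvature_lower_bound(10**6)))  # close to -2/7 = -0.2857...
```

Algebraically, the bound is at least $-2/5$ exactly when $(n-2)(n-8) \ge 0$, which is why small $n$ must be handled by enumeration.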

We next bound the difference between the coarse and asymptotic curvatures. Recall that $\kappa_p(T, S)$ is the coarse Ricci-Ollivier curvature between trees $T$ and $S$ with respect to the lazy walk that remains at a given tree with probability $1 - p$ and moves with probability $p$. For the lazy uniform random walk, $m_T$ is now supported on $\{T\} \cup N(T)$, with each neighbor assigned mass $p/|N(T)|$ and $T$ assigned the remaining $1 - p$ mass. The asymptotic Ricci-Ollivier curvature $\mathrm{ric}(T, S)$ is $\lim_{p \to 0} \kappa_p(T, S)/p$. As we now prove, these two notions of curvature differ only by a small amount inversely proportional to the maximum degree of $T$ and $S$.


Lemma 6.7. Let $T$ and $S$ be trees with $n$ leaves. Then:

i. $\mathrm{ric}(T, S) = \kappa(T, S)$, if $d_{\mathrm{SPR}}(T, S) > 1$,
ii. $\kappa(T, S) \le \mathrm{ric}(T, S) \le \kappa(T, S) + \frac{2}{\max(|N(T)|, |N(S)|)}$, if $d_{\mathrm{SPR}}(T, S) = 1$.

Finally, we bound the difference between the curvature of the uniform random walk $\kappa(T, S)$ and that of the Metropolis-Hastings (MH) random walk $\kappa(\mathrm{MH}; T, S)$. Recall that this random walk proposes a move from a tree $T$ to a neighbor tree $S$ uniformly at random and then accepts the move according to the Hastings ratio, which in this case is $\min(1, |N(T)|/|N(S)|)$. The mass distribution for the MH random walk thus leaves a portion of mass at the origin tree, proportional to the relative degree difference of its higher-degree neighbors. Note that the same statement and proof of Lemma 6.7 hold with $\kappa(T, S)$ and $\mathrm{ric}(T, S)$ replaced by the MH curvatures $\kappa(\mathrm{MH}; T, S)$ and $\mathrm{ric}(\mathrm{MH}; T, S)$, respectively.
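The MH one-step distribution described above can be written down directly. This sketch assumes an adjacency-dictionary graph (our representation) and a uniform target distribution over trees, which gives the acceptance probability $\min(1, |N(T)|/|N(S)|)$; exact fractions make the mass left at the origin visible. The function name `mh_measure` is ours.

```python
from fractions import Fraction


def mh_measure(adj, t):
    """One-step distribution of the Metropolis-Hastings walk with a uniform target:
    propose a neighbor s uniformly, accept with probability min(1, |N(T)| / |N(S)|),
    otherwise stay at t."""
    deg_t = len(adj[t])
    measure = {}
    for s in adj[t]:
        accept = min(Fraction(1), Fraction(deg_t, len(adj[s])))
        measure[s] = Fraction(1, deg_t) * accept
    stay = 1 - sum(measure.values())
    if stay > 0:
        measure[t] = stay  # rejected proposals leave mass at the origin
    return measure


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(mh_measure(star, 1))  # 1/3 moves to the hub, 2/3 stays at the leaf
```

Because the resulting kernel is symmetric, the uniform distribution is stationary; only moves toward higher-degree neighbors leave mass at the origin.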


Lemma 6.8. Let $T$ and $S$ be trees with $n$ leaves. Then:

$$\kappa(T, S) - \frac{1}{3\,d_{\mathrm{SPR}}(T, S)} \le \kappa(\mathrm{MH}; T, S) \le \kappa(T, S) + \frac{1}{3\,d_{\mathrm{SPR}}(T, S)}, \text{ and}$$

$$\kappa(T, S) - 1/6 \le \kappa(\mathrm{MH}; T, S) \le \kappa(T, S) + 1/6.$$

7 Conclusion and future work

In summary, we have gone beyond graph diameter and vertex degree to substantially advance understanding of the phylogenetic rSPR graph. We did so by developing the first theoretical and computational frameworks to bound and compute Ricci-Ollivier curvature of the rSPR graph. We found that curvature, along with degree and distance, determines the early dynamics of hitting times for random walks. Moreover, we proved that rSPR graph degree changes depend quadratically on the product of the size of the regrafted subtree with its change in depth, as well as that the rSPR graph tends toward flatness with respect to rSPR moves that move asymptotically small subtrees. Finally, we proved that the coarse and asymptotic definitions of Ricci-Ollivier curvature are closely related with respect to uniform and Metropolis-Hastings walks on the rSPR graph.

In this data-free setting the stationary distribution is, unlike with real data, quite evenly spread over all trees. Correspondingly, we found that the influence of curvature is small in this case (Fig. 5(a)) and that the probability of the target node in the stationary distribution predominantly determines access times for pairs of trees (Fig. 5(b)). However, it is well known that MCMC takes a long time to approximate real phylogenetic posterior distributions even when the Bayesian credible set is small, and in fact our previous work showed significant SPR graph influence on the mixing time for phylogenetic MCMC for credible sets that had tens, hundreds or thousands of trees [44]. Thus, our next step will be to investigate curvature of MCMC with nontrivial likelihood functions, which will reduce the posterior distribution to a more realistic effective size, and in certain cases will lead to significant “bottlenecks” like those we have observed in real data. In those cases the curvature between two trees at either end of a bottleneck will describe how difficult it is to traverse the bottleneck.

Now that we have established the foundations of using curvature to understand graphs relevant for phylogenetic inference, many graph structures remain to be explored, including NNI graphs, unrooted SPR graphs, graphs of ranked trees [33], graphs of BEAST rooted “time-trees,” and random walks on other discrete structures such as partitions [11] that can be expressed as trees.
A Supplementary Proofs

Theorem 3.1. The subgraph of the rSPR graph induced by a set $\mathcal{T}$ of $m$ trees with $n$ leaves can be constructed in $O(mn^3)$-time.


Proof. The correctness of the procedure follows by induction on the number of trees already processed, $i$, by observing that the procedure has constructed the subgraph of vertices $1, 2, \ldots, i$ and will construct the subgraph of vertices $1, 2, \ldots, i + 1$.

We implement the graph with an adjacency list representation with integer-labelled vertices that supports $O(\log n)$ edge insertions and lookups (with e.g. red-black trees, as the vertex degrees are $O(n^2)$). As described above, the integer labels are simply the order of the input trees. Adding the vertices to the graph requires $O(m)$-time, as they are added in ascending order to the end of the vertex list, which can be stored as a fixed-size array. Adding the $O(mn^2)$ edges to the graph requires $O(mn^2 \log n)$-time. Enumerating the neighbors of $T_i$ requires $O(n^3)$-time for each $T_i$, for a total of $O(mn^3)$-time. We discuss below, in Section 3.2, how to do so efficiently without considering duplicate neighbors. We store the tree-to-index mappings for current vertices of $G$ in a trie using Newick representation. This requires only $O(n)$-time for each tree (i.e. a total of $O(mn)$-time) using a standard nodes-and-pointers representation of the tree and assuming integer leaf labels (a simple $O(mn \log n)$ leaf preprocessing step could be applied to extend this procedure to phylogenetic trees with string labels). Similarly, it takes $O(n)$-time to determine the index of each of the $O(mn^2)$ considered neighbors. Therefore the graph can be constructed in $O(mn^3)$-time, as claimed.
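The construction can be sketched in Python for small inputs. Assumptions (ours, not the paper's): trees are nested pairs with integer leaves, a dictionary keyed by a canonical form stands in for the Newick trie, and neighbors are enumerated by brute force with duplicates removed through a set rather than by the duplicate-free enumeration of Section 3.2. It therefore illustrates the structure of the algorithm, not its running time. All function names are hypothetical.

```python
def canon(t):
    """Canonical form of a rooted binary tree given as nested pairs of int leaves:
    the children of every node are ordered by their string form."""
    if isinstance(t, int):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if str(a) <= str(b) else (b, a)


def nodes(t, path=()):
    """Yield (path, subtree) for every node; a path lists child indices from the root."""
    yield path, t
    if not isinstance(t, int):
        yield from nodes(t[0], path + (0,))
        yield from nodes(t[1], path + (1,))


def get(t, path):
    for i in path:
        t = t[i]
    return t


def replace(t, path, new):
    """Return a copy of t with the subtree at path replaced by new."""
    if not path:
        return new
    children = list(t)
    children[path[0]] = replace(t[path[0]], path[1:], new)
    return tuple(children)


def rspr_neighbors(t):
    """All distinct rSPR neighbors of t, by brute force over (prune, regraft) pairs."""
    t = canon(t)
    found = set()
    for path, sub in nodes(t):
        if not path:
            continue  # the root cannot be pruned
        parent = path[:-1]
        rest = replace(t, parent, get(t, parent)[1 - path[-1]])  # prune sub
        for target, node in nodes(rest):  # regraft above any node, including the root
            found.add(canon(replace(rest, target, (node, sub))))
    found.discard(t)
    return found


def induced_rspr_graph(trees):
    """Adjacency lists of the rSPR graph induced by `trees` (assumed distinct),
    with vertices labelled 0..m-1 in input order."""
    index = {canon(t): i for i, t in enumerate(trees)}  # stands in for the trie
    adj = {i: [] for i in range(len(trees))}
    for i, t in enumerate(trees):
        for s in rspr_neighbors(t):
            j = index.get(s)
            if j is not None:
                adj[i].append(j)
    return adj


three_leaf = [((0, 1), 2), ((0, 2), 1), ((1, 2), 0)]
print({i: sorted(v) for i, v in induced_rspr_graph(three_leaf).items()})
# {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(len(rspr_neighbors((((0, 1), 2), 3))), len(rspr_neighbors(((0, 1), (2, 3)))))  # 10 12
```

The neighbor counts for the 4-leaf ladder and balanced trees agree with Lemma 5.1, which is a useful sanity check on the brute-force enumeration.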


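Lemma 3.2. For a tree $T$ with $n$ leaves,

$$|N(T)| = \sum_{u \in T} |N(T, u)|,$$

for nodes $u$ of $T$, where $N(T, u)$ is as defined above, and:

$$|N(T, u)| = \begin{cases} 2n - x - 5 & \text{if } \mathrm{depth}(u) > 1, \\ 2n - x - 3 & \text{if } \mathrm{depth}(u) = 1, \\ 0 & \text{if } \mathrm{depth}(u) \le 0, \end{cases}$$

where $x$ is the number of nodes in the subtree rooted at $u$.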

Proof. The statement follows if the neighbor assignments are disjoint, that is, $N(T, u) \cap N(T, v) = \emptyset$ for all distinct nodes $u, v$ of $T$. So, suppose, for the purpose of obtaining a contradiction, that there exist two nodes $u$ and $v$ of $T$ such that there exists a tree $S \in N(T, u) \cap N(T, v)$. Then $S$ can be obtained from $T$ by moving the subtree rooted at $u$ or the subtree rooted at $v$. Call these $U$ and $V$, respectively. This implies that both $T \setminus U = S \setminus U$ and $T \setminus V = S \setminus V$ by the definition of an rSPR operation. Then the rSPR moves that move $U$ or $V$ to obtain $S$ must be nearest neighbor interchanges (NNIs), that is, rSPR moves which move their subtree to one of four locations: their grandparent edge, aunt edge, sibling's left child edge or sibling's right child edge. This implies that, without loss of generality, $U$ is moved to its grandparent edge and $V$ to $U$'s sibling (move type (iii)) or $U$ is moved to its aunt edge and $V$ to $U$'s edge (move type (iv)), a contradiction. Therefore the claim holds.
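Lemma 3.2 gives a direct way to compute the rSPR degree from subtree sizes and depths, without enumerating neighbors. A minimal sketch, assuming trees are nested pairs of integer leaves (our representation); a single post-order recursion supplies each subtree size $x$, so the computation is linear in the number of nodes. The `ladder` and half-split `balanced` helpers are our own constructions, and `balanced` is only checked against Lemma 5.1(ii) for small $n$.

```python
def count_leaves(t):
    return 1 if isinstance(t, int) else count_leaves(t[0]) + count_leaves(t[1])


def rspr_degree(t):
    """|N(T)| = sum over nodes u of |N(T, u)|, using the case formula of Lemma 3.2."""
    n = count_leaves(t)
    total = 0

    def visit(u, depth):
        # Returns x, the number of nodes in u's subtree (post-order), and adds |N(T, u)|.
        nonlocal total
        if isinstance(u, int):
            x = 1
        else:
            x = 1 + visit(u[0], depth + 1) + visit(u[1], depth + 1)
        if depth == 1:
            total += 2 * n - x - 3
        elif depth > 1:
            total += 2 * n - x - 5
        return x

    visit(t, 0)  # the root (depth 0) contributes nothing
    return total


def ladder(n):
    t = 0
    for leaf in range(1, n):
        t = (t, leaf)
    return t


def balanced(leaves):
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (balanced(leaves[:mid]), balanced(leaves[mid:]))


print([rspr_degree(ladder(n)) for n in range(3, 8)])                 # [2, 10, 24, 44, 70]
print([rspr_degree(balanced(list(range(n)))) for n in range(3, 8)])  # [2, 12, 28, 52, 84]
```

For the ladder the sum collapses to $3n^2 - 13n + 14$ for every $n$, in agreement with Lemma 5.1(i).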


Lemma 3.3. An rSPR neighbor of a tree $T$ can be chosen uniformly at random in $O(n)$-time using $O(n)$ space.

Proof. We apply the above procedure. We use a standard nodes-and-pointers representation of the trees, which can be constructed in $O(n)$-time from a Newick string representation and uses linear space in $n$. We can compute the degree of $T$ in linear time and space using Lemma 3.2. To efficiently compute $|N(T, u)|$ for each node $u$ of $T$, we require the number of nodes $x$ in the subtree rooted at $u$. We pre-compute these by (1) labeling each node with its preorder number in a preorder traversal and (2) summing the number of descendant nodes in a postorder traversal and storing the results in an array indexed by preorder number. Both of these traversals require $O(n)$-time. There are $2n - 1 = O(n)$ nodes of $T$, and $|N(T, u)|$ can be computed in constant time using the subtree sizes. Moreover, the tree $S$ can be found in $O(n)$-time by iterating over the edges of $T$ that are not contained within $u$'s subtree to select the corresponding rSPR destination. Finally, we require linear time to apply the chosen rSPR operation, which entails removing a node, adding a node, and updating a constant number of pointers. Thus, the for loop requires linear time. By Lemma 3.2, the chosen tree is an rSPR neighbor of $T$ and is chosen uniformly at random. Therefore, the procedure uses linear time and space and selects an rSPR neighbor of $T$ uniformly at random.
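The two-stage sampling argument can be sketched as follows: pick a node $u$ with probability $|N(T, u)|/|N(T)|$, then pick one of $u$'s moves uniformly. This is a minimal sketch under our own assumptions: nested-pair trees, and our reading that the moves excluded for $u$ at depth greater than 1 are regrafts above its sibling (which recreate $T$), grandparent and aunt (which Lemma 3.2 credits to other nodes). The exclusion count matches the $2n - x - 5$ and $2n - x - 3$ cases of Lemma 3.2. Tuple paths are copied, so this Python version is not strictly linear time. All names are hypothetical.

```python
import random
from collections import Counter


def canon(t):
    """Canonical nested-pair form: children ordered by their string form."""
    if isinstance(t, int):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if str(a) <= str(b) else (b, a)


def get(t, path):
    for i in path:
        t = t[i]
    return t


def replace(t, path, new):
    if not path:
        return new
    children = list(t)
    children[path[0]] = replace(t[path[0]], path[1:], new)
    return tuple(children)


def node_table(t):
    """(path, x) for every node, x = number of nodes in its subtree (post-order)."""
    table = []

    def visit(u, path):
        x = 1 if isinstance(u, int) else 1 + visit(u[0], path + (0,)) + visit(u[1], path + (1,))
        table.append((path, x))
        return x

    visit(t, ())
    return table


def sample_rspr_neighbor(t, rng):
    """Uniform rSPR neighbor of t (n >= 3 leaves), sampled in two stages:
    a node u with probability |N(T, u)| / |N(T)|, then one of u's moves."""
    table = node_table(t)
    n = (len(table) + 1) // 2  # a binary tree with n leaves has 2n - 1 nodes

    def weight(path, x):  # |N(T, u)| from Lemma 3.2; depth(u) = len(path)
        if not path:
            return 0
        return 2 * n - x - (3 if len(path) == 1 else 5)

    path, _ = rng.choices(table, weights=[weight(p, x) for p, x in table])[0]
    sub, parent = get(t, path), path[:-1]
    rest = replace(t, parent, get(t, parent)[1 - path[-1]])  # prune sub
    # Exclude regrafts that recreate t (above the sibling, now at `parent`) or that
    # are credited to another node (above the grandparent or the aunt).
    excluded = {parent}
    if parent:
        excluded |= {parent[:-1], parent[:-1] + (1 - parent[-1],)}
    targets = [p for p, _ in node_table(rest) if p not in excluded]
    target = rng.choice(targets)
    return canon(replace(rest, target, (get(rest, target), sub)))


rng = random.Random(0)
t = (((0, 1), 2), 3)
freq = Counter(sample_rspr_neighbor(t, rng) for _ in range(20000))
print(len(freq))  # 10 distinct neighbors, each drawn about 2000 times
```

Because the per-node move sets are disjoint (Lemma 3.2), weighting nodes by their move counts makes every neighbor equally likely.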


Lemma 5.2. Let $T, S$ be trees with $n \ge 3$ leaves, and assume w.l.o.g. that $|N(T)| \le |N(S)|$. Then:

i. $|N(T)|/|N(S)| \ge 3/4$, and
ii. $|N(S)| - |N(T)| \le n^2 - 5n + 6$.

Proof. To prove (i), we simply note from Lemma 5.1 that the ladder tree achieves the minimum degree and the balanced tree achieves the maximum degree. Using $\lfloor \log_2(m+1) \rfloor \ge 1$ for $m \ge 1$, we obtain

$$\frac{|N(T)|}{|N(S)|} \ge \frac{3n^2 - 13n + 14}{4(n-2)^2 - 2\sum_{m=1}^{n-2} \lfloor \log_2(m+1) \rfloor} \ge \frac{3n^2 - 13n + 14}{4(n-2)^2 - 2(n-2)} = \frac{3n^2 - 13n + 14}{4n^2 - 18n + 20} \quad \forall n \ge 3,$$

which is greater than $3/4$ when $n \ge 3$, since $4(3n^2 - 13n + 14) - 3(4n^2 - 18n + 20) = 2n - 4 > 0$. Similarly for (ii):

$$\begin{aligned} \Delta N = |N(S)| - |N(T)| &\le \Bigl(4(n-2)^2 - 2\sum_{m=1}^{n-2} \lfloor \log_2(m+1) \rfloor\Bigr) - (3n^2 - 13n + 14) \\ &\le \bigl(4(n-2)^2 - 2(n-2)\bigr) - (3n^2 - 13n + 14) \\ &= 4n^2 - 16n + 16 - 2n + 4 - 3n^2 + 13n - 14 \\ &= n^2 - 5n + 6. \end{aligned}$$

Lemma 5.3. Let T and S be trees such that S can be 
obtained from T by moving a subtree R with k leaves 
from its position adjacent to subtree U to a location 
adjacent to subtree V. Let L be the LCA(U,V) in T. 
Let a be the number of intermediate nodes on the path 
from the parent of R to L in T, excluding endpoints. 
Similarly, let b be the number of intermediate nodes on 
the path from V to L in T, excluding endpoints. Let i be 
the number of leaves in U and j be the number of leaves 
in V, excluding any leaves of R. Then the degrees of T 
and S differ by: 

$$2\,(k(a - b) + i - j).$$

Proof. The set of permissible rSPR moves changes in 
four different ways due to the movement of R: (i) 
subtrees that include nodes on the path from U to L 
may now be moved into R and its newly introduced 
parent node, (ii) subtrees that include nodes on the 
path from V to L may no longer be moved into R and its 
parent node, (iii) R’s parent subtree may now be moved 
into U, and (iv) R’s parent subtree may no longer be 
moved into V. No additional moves are introduced or 
blocked by the original rSPR operation on R. 

Recall that a rooted tree with $k$ leaves has $2(k - 1)$ edges (recall that we are excluding any “root edge” in these calculations). In the first case there are $a$ subtrees that can now be moved onto the $2k$ edges in $R$ (including its newly introduced parent edge and one of the newly subdivided root edges of $V$), for a total gain of $2ka$ distinct moves. Similarly, we lose $2kb$ moves in the second case. In the third case, $R$'s parent subtree may now make $2(i - 1)$ moves into $U$. Similarly, we lose $2(j - 1)$ moves in the fourth case.

Thus the difference in rSPR degree is $2ka - 2kb + 2(i - 1) - 2(j - 1) = 2(k(a - b) + i - j)$, as claimed.

Corollary 5.4. Continuing with the setting and notation in Lemma 5.3, at least

$$\gamma := \deg(T) - 2kb - 2(j - 1) = \deg(S) - 2ka - 2(i - 1)$$

trees in the neighborhood of $T$ can be paired with $\gamma$ trees in the neighborhood of $S$ such that the pairings are disjoint and $d_{\mathrm{SPR}}(T', S') = 1$ for each $(T', S')$ pair.











Proof. By the same arguments as in the proof of Lemma 5.3, $\gamma$ rSPR moves can be applied to $T$ and $S$ with the same source and target nodes. For each such $(T', S')$ pair, we can move $R$ in either tree to obtain the other member of the pair.

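Lemma 5.5. Let $T, S$ be trees with $n \ge 3$ leaves such that $|N(T)| \le |N(S)|$ and $d_{\mathrm{SPR}}(T, S) = 1$. Then:

i. $|N(S)| - |N(T)| \le 2\lfloor \frac{n-2}{2} \rfloor \lceil \frac{n-2}{2} \rceil \le \frac{1}{2}(n-2)^2$,
ii. $|N(T)|/|N(S)| \ge 5/6$ for all $n \ge 4$, and
iii. $\lim_{n \to \infty} |N(T)|/|N(S)| \ge 6/7$.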

Proof. We first prove (i). By Lemma 5.3, $|N(S)| - |N(T)| = 2(k(a - b) + i - j)$. This value is maximized by making $L$ the root and minimizing $b$, namely by setting $b = 0$. The resulting expression $2(ka + i - j)$ is similarly maximized by setting $i = 1$ (which allows us to increase $a$) and then maximally balancing the terms in the product $ka$ as follows.

There are two cases, depending on whether the subtree of $k$ leaves is moved to the root or not. If not, then we set $j = 1$ and split the remaining $n - b - i - j = n - 2$ leaves between $k$ and $a$ in as balanced a way as possible, giving (i). Note that this corresponds to moving the bottom subtree of $\lfloor (n-2)/2 \rfloor$ or $\lceil (n-2)/2 \rceil$ leaves in a ladder tree to the root-most leaf of the tree.

If the subtree of $k$ leaves is moved to the root, then we do not need to exclude the target branch from $k$ and $a$, gaining an additional leaf to balance the product $ka$ at the cost of increasing $j$. This corresponds to moving the bottom subtree of … or … leaves in a ladder tree to the root. Namely, we have $2(ka + 1 - j)$, where $j = n - k = a + 1$. Let $\Delta N = |N(S)| - |N(T)|$. If we move the additional leaf, we have:



like before. Similarly, if we do not move the additional 
leaf, we also have: 



proving (i). 

The relative change in degree, $|N(T)|/|N(S)|$, can also be written as $\frac{|N(T)|}{|N(T)| + (|N(S)| - |N(T)|)}$. By (i), we have that

$$\frac{|N(T)|}{|N(S)|} \ge \frac{|N(T)|}{|N(T)| + \frac{1}{2}(n-2)^2}.$$

This bound is minimized when $|N(T)|$ is minimized, and recall by Lemma 5.1 that $|N(T)|$ is bounded below by $3n^2 - 13n + 14$. Thus

$$\frac{|N(T)|}{|N(S)|} \ge \frac{3n^2 - 13n + 14}{3n^2 - 13n + 14 + \frac{1}{2}(n-2)^2} = \frac{3n^2 - 13n + 14}{3.5n^2 - 15n + 16}.$$

Statements (ii) and (iii) follow from this bound.

Lemma 5.6. Let $T$ and $S$ be trees such that $d_{\mathrm{SPR}}(T, S) = 1$. Then $|N(T) \cap N(S)| \le 6n - 17$.

Proof. $T$ and $S$ differ by one rSPR move that moves a subtree $R$. Pick a neighbor $U \in N(T) \cap N(S)$ of both $T$ and $S$ (this intersection is not empty: $T$ and $S$ are different, so $R$ contains at most $n - 2$ of the leaves, thus there must be at least one other tree $U$ obtained by moving $R$ in $T$ and $S$). Then either (i) $T$ and $U$ differ in the location of $R$, or (ii) $T$ and $U$ differ in the location of another subtree $Q$. In the latter case, $T|_{X \setminus L(Q)} = S|_{X \setminus L(Q)}$ because $T$ and $S$ differ only in the location of $R$ and $d_{\mathrm{SPR}}(T, U) = d_{\mathrm{SPR}}(S, U) = 1$. Then leaves $r' \in R$, $q' \in Q$, and $w' \in W$, for some subtree $W$, form a triple of $T$ and a different triple in $S$. This incompatible triple can be resolved in at most $6n - 17$ ways, the maximum of which is reached when $Q$, $W$, and $R$ are themselves a “triple” of subtrees. By Lemma 3.2, each of the subtrees is assigned to at most $2n - 6$ unique moves. Moreover, one additional overlapping move also moves one of the subtrees (that of the aunt of the LCA of the three subtrees). The number of shared neighbors is thus at most $3(2n - 6) + 1 = 6n - 17$. Note that this bound is tight when, for example, $T$ and $S$ are ladders with a different configuration of 3 leaves at maximum depth.


Theorem 6.1. Fix a positive integer $k$ and let $R$ be a tree with $k$ leaves. Let $\{T_n \mid n > k\}$ be a sequence of trees all containing $R$, and let $\{S_n \mid n > k\}$ be the same sequence $T_n$ but with $R$ cut off and attached at a different location. Then $\lim_{n \to \infty} \kappa(T_n, S_n) = 0$ for the uniform random walk on the rSPR graph.

Proof. Because $d_{\mathrm{SPR}}(T_n, S_n) = 1$, we will prove the theorem by showing that the mass transport term $W_{1,n}$ sits between two bounds, each of which has limit 1 as $n$ goes to infinity.

To start, we demonstrate the theorem in the case that $T_n$ and $S_n$ have the same number of neighbors. First we claim that $W_{1,n}$ is bounded above by $(|N(T_n)| + O(kn))/|N(T_n)|$ by exhibiting a mass transport program satisfying that bound. Let $(T'_n, S'_n)$ be any of the $\gamma$ pairs of neighbors of $(T_n, S_n)$ which are one rSPR move apart as per Corollary 5.4. We pair these trees in the mass transport. There are $O(kn)$ trees unmatched by this pairing, and we can pair each of them arbitrarily with another tree at distance at most 3. Thus, $W_{1,n}$ is bounded above by $(|N(T_n)| + O(kn))/|N(T_n)|$.

A lower bound is also available because we cannot do better than distance 1 for all trees except for shared neighbors, of which there are $O(n)$ by Lemma 5.6. By ignoring these trees we get a lower bound of $(|N(T_n)| - O(n))/|N(T_n)|$ for $W_{1,n}$.

The desired control of $W_{1,n}$ is thus obtained because $|N(T_n)|$ is quadratic in $n$.

Now we prove the theorem when the numbers of neighbors differ. Assume without loss of generality that $|N(T_n)| < |N(S_n)|$. By Lemma 5.3, $|N(S_n)| - |N(T_n)| = 2(k(a - b) + i - j)$, where each of $a, b, i, j$ is less than $n$. Thus, $|N(S_n)| - |N(T_n)| = O(kn)$. We again pair neighbor $T'_n$ of $T_n$ with neighbor $S'_n$ of $S_n$ such that $d_{\mathrm{SPR}}(T'_n, S'_n) = 1$ but, as $|N(T_n)| < |N(S_n)|$, we can only account for at most $|N(T_n)|/|N(S_n)|$ of the mass directly and may have to move the $(|N(S_n)| - |N(T_n)|)/|N(S_n)|$ remainder to trees at distance at most 3. Thus, $W_{1,n}$ is bounded above by $(|N(T_n)| + O(kn))/|N(S_n)| = (|N(S_n)| + O(kn))/|N(S_n)|$. We again bound $W_{1,n}$ from below with $(|N(T_n)| - O(n))/|N(T_n)|$ by ignoring the mass in common neighbors of $T_n$ and $S_n$. The theorem again follows because $|N(T_n)|$ is quadratic in $n$.

Lemma 6.2. Let $T$ and $S$ be two trees. Then:

$$-\frac{2}{d_{\mathrm{SPR}}(T, S)} \le \kappa(T, S) \le \frac{2}{d_{\mathrm{SPR}}(T, S)}.$$

Proof. Observe that the distance between neighbors of $T$ and $S$ is bounded between $d_{\mathrm{SPR}}(T, S) - 2$ and $d_{\mathrm{SPR}}(T, S) + 2$. For the curvature upper bound, we then have $\kappa(T, S) \le 1 - \frac{d_{\mathrm{SPR}}(T, S) - 2}{d_{\mathrm{SPR}}(T, S)} = \frac{2}{d_{\mathrm{SPR}}(T, S)}$. The lower bound follows similarly.

Lemma 6.3. The maximum curvature between two adjacent trees with $n$ leaves is $\frac{6n - 17}{3n^2 - 13n + 14}$.

Proof. The maximum curvature between adjacent trees $T$ and $S$ occurs when their neighborhoods have maximum overlap and all other tree pairs are at distance 1. By Lemma 5.6, the maximum overlap is $6n - 17$. The amount of overlapping mass in the shared neighbors of $T$ and $S$ is thus $\frac{6n - 17}{\max(|N(T)|, |N(S)|)}$. The minimum mass transfer cost is thus $1 - \frac{6n - 17}{\max(|N(T)|, |N(S)|)}$. This is minimized when $|N(T)| = |N(S)|$ are as small as possible, that is, $T$ and $S$ are ladders and $|N(T)| = 3n^2 - 13n + 14$. The maximum curvature is thus

$$1 - \frac{|N(T)| - (6n - 17)}{|N(T)|} = \frac{6n - 17}{|N(T)|} = \frac{6n - 17}{3n^2 - 13n + 14}.$$

Lemma 6.5. The curvature between adjacent trees with $n$ leaves is at least

$$\frac{-n^2 + 2n}{3.5n^2 - 15n + 16}.$$

Proof. In light of Corollary 5.4, the optimal mass transport cost is maximized (and therefore curvature minimized) across adjacent trees $T$ and $S$ by a combination of two effects: trees that cannot be paired at distance 1, and mass that must be moved between unpaired trees due to differing degrees of $T$ and $S$. As we will show, these effects can be optimized simultaneously. To bound these effects, let $m$ be the maximum (across $T$ and $S$) proportion of mass that cannot be moved between adjacent neighbors of those trees. We can bound the mass transport cost from above by $1 + 2m$ because pairs of neighbors of adjacent trees are at most distance 3 apart. This gives a lower bound of $1 - (1 + 2m)/1 = -2m$ on the curvature.

By Lemmas 5.3 and 5.5, the latter effect is maximized when the relative degree change is maximized. By Corollary 5.4 there are at least $\gamma := |N(T)| - 2ka - 2(i - 1)$ paired trees, bounding the former effect. We now construct a pair of trees that maximizes both effects. Let $S$ be the ladder tree with degree $3n^2 - 13n + 14$ and $T$ be the adjacent tree constructed by moving the lower leaves of $S$ to the root. $T$ has degree at most $3.5n^2 - 15n + 16$. There are thus $2ka + 2(i - 1) = 2(ka + (1 - 1)) \le \frac{1}{2}n^2 - n$ unpaired neighbors, the maximum possible. Moreover, as shown by Lemma 5.3, this pair of trees obtains the maximum (absolute and relative) degree change. Thus, the maximum $m$ is:

$$m = \frac{\frac{1}{2}n^2 - n}{3.5n^2 - 15n + 16}.$$

The claim follows from multiplying this value by $-2$.
Lemma 6.7. Let $T$ and $S$ be trees with $n$ leaves. Then:

i. $\mathrm{ric}(T, S) = \kappa(T, S)$, if $d_{\mathrm{SPR}}(T, S) > 1$,
ii. $\kappa(T, S) \le \mathrm{ric}(T, S) \le \kappa(T, S) + \frac{2}{\max(|N(T)|, |N(S)|)}$, if $d_{\mathrm{SPR}}(T, S) = 1$.

Proof. We first prove the lower bound in the uniform case, that is, $\kappa(T, S) \le \mathrm{ric}(T, S)$. Let $W_1(T, S)$ be the mass transport cost in the uniform case, and $W_1^p(T, S)$ be the same for the lazy uniform case with parameter $p$. Recall that $\kappa(T, S) = \kappa_1(T, S) = 1 - \frac{W_1(T, S)}{d_{\mathrm{SPR}}(T, S)}$ and $\kappa_p(T, S)/p = \frac{1}{p}\left(1 - \frac{W_1^p(T, S)}{d_{\mathrm{SPR}}(T, S)}\right)$. Observe that

$$W_1^p(T, S) \le p\,W_1(T, S) + (1 - p)\,d_{\mathrm{SPR}}(T, S),$$

by the simple mass transport program obtained by treating the mass at $T$ and $S$ as separate from that of the neighbors. Then:

$$\frac{\kappa_p(T, S)}{p} \ge \frac{1}{p}\left(1 - \frac{p\,W_1(T, S) + (1 - p)\,d_{\mathrm{SPR}}(T, S)}{d_{\mathrm{SPR}}(T, S)}\right) = \frac{1}{p} - \frac{W_1(T, S)}{d_{\mathrm{SPR}}(T, S)} - \frac{1 - p}{p} = 1 - \frac{W_1(T, S)}{d_{\mathrm{SPR}}(T, S)} = \kappa(T, S).$$

For the upper bound, we observe that

$$W_1^p(T, S) \ge p\,W_1(T, S) + (1 - p)\,d_{\mathrm{SPR}}(T, S) - \frac{2p}{\max(|N(T)|, |N(S)|)},$$

as at most $1/\max(|N(T)|, |N(S)|)$ of the mass can remain at each of $T$ and $S$, paired with the lazy remainder. The upper bound then follows analogously to the lower bound. Moreover, no mass can remain at $T$ or $S$ when $d_{\mathrm{SPR}}(T, S) > 1$, in which case the curvatures are equal.

Lemma 6.8. Let $T$ and $S$ be trees with $n$ leaves. Then:

$$\kappa(T, S) - \frac{1}{3\,d_{\mathrm{SPR}}(T, S)} \le \kappa(\mathrm{MH}; T, S) \le \kappa(T, S) + \frac{1}{3\,d_{\mathrm{SPR}}(T, S)}, \text{ and}$$

$$\kappa(T, S) - 1/6 \le \kappa(\mathrm{MH}; T, S) \le \kappa(T, S) + 1/6.$$

Proof. We first prove the lower bound. By Lemma 5.5, the quotient of degrees for two adjacent trees is at least $5/6$. Thus, the Hastings ratio is always at least $5/6$. This implies that at most $1/6$ of the mass remains at tree $T$ in the mass distribution. Let $W_1(T, S)$ be the cost of an optimal mass transport for the uniform random walk from $T$ to $S$, and $W_1'(T, S)$ the cost for the MH random walk. Moreover, let $m_T(z)$ and $m_S(w)$ be the mass assigned for the uniform random walk and $m'_T(z)$ and $m'_S(w)$ be the mass assigned for the MH random walk, for each vertex $z \in N(T)$ and $w \in N(S)$. We construct an upper bound on $W_1'(T, S)$ by moving mass according to $W_1$ where possible, and moving the remainder either from $T$ to $S$, from $T$ to a neighbor of $S$, or from a neighbor of $T$ to $S$. That is, for each $W_1$ assignment $\xi(z, w)$, we send

$$\xi'(z, w) = \xi(z, w) \min\left(\frac{m'_T(z)}{m_T(z)}, \frac{m'_S(w)}{m_S(w)}\right)$$

of the mass from $z$ to $w$. The remaining $\xi(z, w) - \xi'(z, w)$ of the mass is moved from $T$ to $S$, $T$ to $w$, and $z$ to $S$ in the appropriate proportions. The maximum possible mass that is not moved according to $W_1$ is $1/6$. Moreover, the affected mass must be moved through at most two additional trees. Then, $W_1' \le W_1 + 1/3$. We now have:

$$\kappa(\mathrm{MH}; T, S) \ge 1 - \frac{W_1(T, S) + 1/3}{d_{\mathrm{SPR}}(T, S)} = \kappa(T, S) - \frac{1}{3\,d_{\mathrm{SPR}}(T, S)}.$$

In the case that $d_{\mathrm{SPR}}(T, S) = 1$, the affected mass must be moved through only at most one additional tree, as $T$ and $S$ are adjacent. We thus obtain the lower bound of $\kappa(T, S) - 1/6$ in this case.

We obtain the upper bounds similarly to the lower bounds, by observing that the affected mass (at most $1/6$ of the total) may move through at most two fewer trees (i.e. directly between $T$ and $S$ rather than between a pair of neighbors at distance $d_{\mathrm{SPR}}(T, S) + 2$ from each other). Again, this is at most one fewer tree when $d_{\mathrm{SPR}}(T, S) = 1$.




















